Topmoumoute Online Natural Gradient Algorithm

Authors

  • Nicolas Le Roux
  • Pierre-Antoine Manzagol
  • Yoshua Bengio
Abstract

Natural gradient is a gradient descent technique that uses the inverse of the covariance matrix of the gradient. Using the central limit theorem, we prove that it yields the direction that minimizes the probability of overfitting. However, its prohibitive computational cost makes it impractical for online training. Here, we present a new online version of the natural gradient, which we call TONGA (Topmoumoute Online Natural Gradient Algorithm).
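
As a rough illustration of the direction the abstract describes, the following NumPy sketch computes a regularized natural-gradient step from a batch of per-example gradients. The damping term eps and the batch-covariance estimate are illustrative assumptions; TONGA's low-rank online approximation, which avoids the cubic-cost solve below, is not shown.

    import numpy as np

    def natural_gradient_step(per_example_grads, lr=0.1, eps=1e-4):
        # per_example_grads: array of shape (n_examples, n_params).
        # g is the mean gradient; C is the (damped) covariance of the
        # per-example gradients around it.
        g = per_example_grads.mean(axis=0)
        centered = per_example_grads - g
        C = centered.T @ centered / len(per_example_grads)
        C += eps * np.eye(C.shape[0])       # damping keeps C invertible
        return -lr * np.linalg.solve(C, g)  # natural-gradient direction C^{-1} g

The np.linalg.solve call costs O(d^3) in the number of parameters d, which is exactly the prohibitive cost the abstract mentions and the motivation for an online approximation.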


Similar papers

A Stochastic Quasi-Newton Method for Online Convex Optimization

We develop stochastic variants of the well-known BFGS quasi-Newton optimization method, in both full and memory-limited (LBFGS) forms, for online optimization of convex functions. The resulting algorithm performs comparably to a well-tuned natural gradient descent but is scalable to very high-dimensional problems. On standard benchmarks in natural language processing, it asymptotically outperfor...
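
For context, the memory-limited machinery this abstract builds on is the standard L-BFGS two-loop recursion, sketched below. This is the classical deterministic form, not the authors' stochastic variant, and the history lists of (s, y) pairs are assumed inputs.

    import numpy as np

    def lbfgs_direction(grad, s_list, y_list):
        # Two-loop recursion with s_i = x_{i+1} - x_i, y_i = g_{i+1} - g_i,
        # oldest pairs first. Returns approximately -H^{-1} @ grad.
        q = grad.copy()
        stack = []
        for s, y in zip(reversed(s_list), reversed(y_list)):
            rho = 1.0 / y.dot(s)
            alpha = rho * s.dot(q)
            q -= alpha * y
            stack.append((rho, alpha, s, y))
        if s_list:  # initial Hessian scaling from the most recent pair
            q *= s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1])
        for rho, alpha, s, y in reversed(stack):
            beta = rho * y.dot(q)
            q += (alpha - beta) * s
        return -q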


Matrix momentum for practical natural gradient learning

An on-line learning rule, based on the introduction of a matrix momentum term, is presented, aimed at alleviating the computational costs of standard natural gradient learning. The new rule, natural gradient matrix momentum, is analysed in the case of two-layer feed-forward neural network learning via methods of statistical physics. It appears to provide a practical algorithm that performs as we...
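
Under one common reading of the matrix-momentum idea, the scalar momentum coefficient is replaced by the matrix (I - lr*lam*F), so that the fixed point of the update is the natural-gradient step -(1/lam) F^{-1} g while only cheap Fisher-vector products are ever needed. The single-example empirical-Fisher product below (F v ~ g (g . v)) is an assumption for illustration, not the paper's exact estimator.

    import numpy as np

    def matrix_momentum_step(w, dw, g, lr=0.01, lam=1.0):
        # g: current single-example gradient; dw: previous update.
        # Empirical-Fisher product: F ~ g g^T, so F @ dw ~ g * (g . dw).
        F_dw = g * g.dot(dw)
        dw_new = -lr * g + dw - lr * lam * F_dw
        return w + dw_new, dw_new

Because only matrix-vector products with F appear, each step stays O(d) in the number of parameters, which is the practical appeal over forming and inverting F.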


On "Natural" Learning and Pruning in Multilayered Perceptrons

Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this paper, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an ...
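
The Levenberg-Marquardt connection mentioned above can be made concrete: damping the Fisher matrix before inverting interpolates between a natural-gradient step and plain gradient descent. The outer-product (empirical) Fisher estimate below is an illustrative assumption.

    import numpy as np

    def damped_natural_gradient(per_example_grads, lam=1e-2):
        # Empirical Fisher F = E[g g^T] from a batch of per-example gradients;
        # returns the Levenberg-Marquardt-style direction (F + lam*I)^{-1} g.
        G = np.asarray(per_example_grads)
        g = G.mean(axis=0)
        F = G.T @ G / len(G)
        return np.linalg.solve(F + lam * np.eye(F.shape[0]), g)

Small lam recovers the natural-gradient direction; large lam shrinks the step toward plain gradient descent, mirroring the Levenberg-Marquardt trade-off.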


PII: S0165-1684(01)00146-3

In this paper, we study the convergence and efficiency of the batch estimator and the natural gradient algorithm for blind deconvolution. First, the blind deconvolution problem is formulated in the framework of a semiparametric model, and a family of estimating functions is derived for blind deconvolution. To improve the learning efficiency of the online algorithm, explicit standardized estimating function...
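
As a simpler stand-in for the convolutive setting studied here, Amari's natural-gradient update for an instantaneous demixing matrix shows the flavor of these algorithms. The tanh score function and the square-mixture assumption are illustrative choices, not the paper's estimating functions.

    import numpy as np

    def natural_gradient_bss_step(W, x, lr=0.01):
        # One natural-gradient step W <- W + lr * (I - phi(y) y^T) W, y = W x.
        # phi = tanh is a common score-function choice for super-Gaussian sources.
        y = W @ x
        phi = np.tanh(y)
        return W + lr * (np.eye(W.shape[0]) - np.outer(phi, y)) @ W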


Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient

The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest-descent direction of a loss function in the Riemannian space. Stochastic gradient learning is much more effective when the natural gradient is used. The present paper ...
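
In symbols, steepest descent in a Riemannian parameter space replaces the Euclidean gradient with the metric-corrected one, where G(w) denotes the Riemannian (Fisher) metric tensor:

    \tilde{\nabla} L(w) = G(w)^{-1} \nabla L(w),
    \qquad
    w_{t+1} = w_t - \eta_t \, G(w_t)^{-1} \nabla L(w_t)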



Journal title:

Volume   Issue

Pages  -

Publication date: 2007